Libraries

Link to dataset https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales

Context

The number of supermarkets in the most populated cities is growing, and market competition is high. This dataset is a historical sales record of a supermarket company, collected over 3 months from 3 branches located in 3 different cities in Myanmar. Predictive data analytics methods are easy to apply to this dataset.

Data Dictionary

Invoice id: Computer generated sales slip invoice identification number

Branch: Branch of supercenter (3 branches are available identified by A, B and C).

City: Location of supercenters

Customer type: Type of customer, recorded as Member for customers using a member card and Normal for customers without one.

Gender: Gender type of customer

Product line: General item categorization groups - Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, Sports and travel

Unit price: Price of each product in $

Quantity: Number of products purchased by customer

Tax: 5% tax fee charged on the customer's purchase

Total: Total price including tax

Date: Date of purchase (Record available from January 2019 to March 2019)

Time: Purchase time (10am to 9pm)

Payment: Payment used by customer for purchase (3 methods are available – Cash, Credit card and Ewallet)

COGS: Cost of goods sold

Gross margin percentage: Gross margin percentage

Gross income: Gross income

Rating: Customer satisfaction rating of their overall shopping experience (on a scale of 1 to 10)
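
The monetary fields in the dictionary above can be tied together with a small worked example. This is a sketch under stated assumptions: the dictionary only says that Tax is 5% and that Total includes tax, so treating COGS as unit price times quantity, and applying the tax to COGS, are assumptions here. The input values are illustrative and are not taken from the dataset.

```python
# Illustrative inputs only; not a row from the dataset.
unit_price = 20.00
quantity = 3
TAX_RATE = 0.05  # the data dictionary states a 5% tax

# Assumption: COGS is the unit price multiplied by the quantity purchased.
cogs = unit_price * quantity

# Assumption: the 5% tax is applied to COGS.
tax = TAX_RATE * cogs

# Per the data dictionary, Total is the price including tax.
total = cogs + tax

print(f"cogs={cogs:.2f} tax={tax:.2f} total={total:.2f}")
```

Recomputing these columns from Unit price and Quantity and comparing them with the recorded values is a quick consistency check during exploration.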

Purpose

This dataset can be used for predictive data analytics.

Task 1: Initial Data Exploration

Task 2: Univariate Analysis

Question 1: What does the distribution of customer rating look like? Is it skewed?

Answer 1: The distribution is not skewed; ratings are spread roughly uniformly across the scale.
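
One way to back up a visual judgement about skew is to compute the sample skewness and compare the mean with the median. The sketch below assumes pandas and a column named `Rating`; the small DataFrame is a made-up stand-in for the real data, not values from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's DataFrame (made-up ratings).
df = pd.DataFrame({"Rating": [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]})

# Sample skewness: values near 0 suggest a roughly symmetric distribution.
rating_skew = df["Rating"].skew()

# A mean close to the median is another quick sign of symmetry.
rating_mean = df["Rating"].mean()
rating_median = df["Rating"].median()

print(f"skew={rating_skew:.3f} mean={rating_mean:.2f} median={rating_median:.2f}")
```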

Question 2: Do aggregate sales numbers differ by much between Branches?

Answer 2: No, aggregate sales do not differ much between branches.
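
A comparison like this can be made concrete by aggregating per branch. The following is a minimal sketch, assuming pandas and columns named `Branch` and `Total`; the rows are invented for illustration and do not reproduce the dataset's figures.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Branch": ["A", "B", "C", "A", "B", "C"],
    "Total": [100.0, 120.0, 110.0, 90.0, 80.0, 105.0],
})

# Number of invoices and summed sales for each branch.
sales_by_branch = df.groupby("Branch")["Total"].agg(["count", "sum"])

# Each branch's share of overall sales makes the comparison easier to read.
sales_by_branch["share"] = sales_by_branch["sum"] / sales_by_branch["sum"].sum()

print(sales_by_branch)
```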

Task 3: Bivariate Analysis

Question 3: Is there a relationship between gross income and customer ratings?

Answer 3: The trendline is nearly flat, so there is no apparent relationship.
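
A flat trendline corresponds to a correlation coefficient near zero, which can be checked numerically. This sketch assumes pandas and the column names `gross income` and `Rating`; the values are invented and were chosen so that the correlation is zero, purely to show what a flat relationship looks like in numbers.

```python
import pandas as pd

# Made-up values chosen to be uncorrelated; not taken from the dataset.
df = pd.DataFrame({
    "gross income": [5.0, 10.0, 15.0, 20.0, 25.0, 30.0],
    "Rating": [8.0, 6.0, 7.0, 7.0, 6.0, 8.0],
})

# Series.corr computes the Pearson correlation by default.
r = df["gross income"].corr(df["Rating"])

print(f"Pearson r = {r:.3f}")
```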

Question 3.5: What is the relationship between branches and gross income?

Answer 3.5: No significant difference; the median is slightly higher in branch C.

Question 3.6: What is the relationship between Gender and gross income?

Answer 3.6: At the 75th percentile, women spend slightly more than men.

Question 4: Is there a noticeable time trend in gross income?

Answer 4: No clear trend, since the data covers only 3 months, although gross income is higher on Valentine's Day (Feb 14) than on other days.
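
A time trend is usually examined by parsing the date column and aggregating per day. The sketch below assumes pandas, columns named `Date` and `gross income`, and dates stored as month/day/year strings; all of these are assumptions, and the rows are invented for illustration.

```python
import pandas as pd

# Made-up rows; the date format (month/day/year) is an assumption.
df = pd.DataFrame({
    "Date": ["01/01/2019", "01/01/2019", "01/02/2019", "01/02/2019", "01/03/2019"],
    "gross income": [10.0, 14.0, 20.0, 30.0, 12.0],
})

# Convert the strings to real datetimes so they sort and group correctly.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Mean gross income per day, then the day with the highest mean.
daily_mean = df.groupby("Date")["gross income"].mean()
peak_day = daily_mean.idxmax()

print(daily_mean)
print("Peak day:", peak_day.date())
```

Plotting `daily_mean` as a line chart is the natural next step for spotting peaks on specific dates.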

Task 4: Dealing with Duplicate rows and Missing values
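
A minimal sketch of this task, assuming pandas: count and drop exact duplicate rows, count missing values per column, then fill the gaps. The column names and rows are hypothetical, and filling numeric gaps with the column mean is just one possible policy, not necessarily the one used in this analysis.

```python
import pandas as pd

# Hypothetical rows with one exact duplicate and one missing value.
df = pd.DataFrame({
    "Invoice ID": ["001", "002", "002", "003"],
    "Unit price": [10.0, 20.0, 20.0, None],
    "Quantity": [1, 2, 2, 3],
})

# Count rows that exactly repeat an earlier row, then drop them.
n_duplicates = int(df.duplicated().sum())
deduped = df.drop_duplicates()

# Count missing values in each column.
missing_per_column = deduped.isna().sum()

# Assumed policy: replace missing numeric values with the column mean.
filled = deduped.fillna(deduped.mean(numeric_only=True))

print("Duplicate rows:", n_duplicates)
print(missing_per_column)
print(filled)
```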

Task 5: Correlation Analysis
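
A correlation analysis typically starts from a pairwise correlation matrix of the numeric columns. The following sketch assumes pandas; the column names and values are invented for illustration, and non-numeric columns such as `Branch` are excluded before computing correlations.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Branch": ["A", "B", "C", "A"],
    "Unit price": [10.0, 20.0, 30.0, 40.0],
    "Quantity": [4, 3, 2, 1],
    "Rating": [6.0, 8.0, 8.0, 6.0],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr_matrix = df.select_dtypes(include="number").corr().round(2)

print(corr_matrix)
```

The resulting matrix can be passed to a heatmap plot to make strong positive or negative pairs easier to see.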

Completed